Arborest – a Growing Treebank of Estonian
Abstract
Treebank creation is a very labor-consuming task, especially if the applications intended include machine learning, gold standard parser evaluation or teaching, since only a manually checked syntactically annotated corpus can provide optimal support for these purposes. There are, however, possibilities to make the annotation process (partly) automatic, saving (manual) annotation time and/or allowing the creation of larger corpora. Whenever possible, existing resources – both corpora and grammars – should be reused.
Similar resources
Arborest – a VISL-Style Treebank Derived from an Estonian Constraint Grammar Corpus
Treebank creation is a very labor-consuming task, especially if the applications intended include machine learning, gold standard parser evaluation or teaching, since only a manually checked syntactically annotated corpus can provide optimal support for these purposes. There are, however, possibilities to make the annotation process (partly) automatic, saving (manual) annotation time and/or all...
Estonian Copular and Existential Constructions as an UD Annotation Problem
This article is about annotating clauses with nonverbal predication in version 2 of Estonian UD treebank. Three possible annotation schemas are discussed, among which separating existential clauses from copular clauses would be theoretically most sound but would need too much manual labor and could possibly yield inconsistent annotation. Therefore, a solution has been adapted which separates ex...
Estonian Dependency Treebank: from Constraint Grammar tagset to Universal Dependencies
This paper presents the first version of Estonian Universal Dependencies Treebank which has been semi-automatically acquired from Estonian Dependency Treebank and comprises ca 400,000 words (ca 30,000 sentences) representing the genres of fiction, newspapers and scientific writing. Article analyses the differences between two annotation schemes and the conversion procedure to Universal Dependen...
Syntactically annotated corpora of Estonian
Syntactically annotated corpora are needed 1) to train and test parsers and various language technological products (grammar checkers, information retrievers and extractors, machine translators, etc.); 2) to check the agreement of existing linguistic theories with the real language usage. The corpora can be annotated on different levels of depth. In shallow syntactically annotated corpora a syntact...
Study of the effect of Estonian and aqueous extract of Persian walnut tree leaf (Juglans regia) on growth indicators in western white shrimp farmed (Litopenaeus vannamei)
The aim of this study was to investigate the effects of Estonian and aqueous extracts of Persian walnut leaves on the performance of growth indices in western white shrimp (Litopenaeus vannamei). Materials and methods included 6 treatments of shrimp with different concentrations of 100, 200 and 300 mg/kg aqueous and Estonian extracts of Persian walnut leaves in the diet and 2 negative control t...
Publication date: 2004